{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8b0e0905",
   "metadata": {},
   "source": [
    "# Speech to Text\n",
    "\n",
    "This example will show you how to use GPTCache and OpenAI to implement speech to text, i.e. to turn audio into text. Where the OpenAI model will be used to turn the audio data, and GPTCache will cache the generated text so that the next time the same or similar audio is requested, it can be returned directly from the cache, which can improve efficiency and reduce costs.\n",
    "\n",
    "This bootcamp is divided into three parts: how to initialize gptcache, running the openai model to turn audio data into text, and finally showing how to start the service with gradio. You can also find the colab in here. You can also try this example on [Google Colab](https://colab.research.google.com/drive/1vtiD-emu9gJdzPq0c5dwGOV2ngmug9EN?usp=share_link)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11b0f837",
   "metadata": {},
   "source": [
    "## Initialize the gptcache\n",
    "\n",
    "Please [install gptcache](https://gptcache.readthedocs.io/en/latest/index.html#) first, then we can initialize the cache. There are two ways to initialize the cache, the first is to use the map cache (exact match cache) and the second is to use the database cache (similar search cache), it is more recommended to use the second one, but you have to install the related requirements.\n",
    "\n",
    "Before running the example, make sure the `OPENAI_API_KEY` environment variable is set by executing `echo $OPENAI_API_KEY`. If it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix/Linux/MacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46326449",
   "metadata": {},
   "source": [
    "### 1. Init for exact match cache\n",
    "\n",
    "`cache.init` is used to initialize gptcache, the default is to use map to search for cached data, `pre_embedding_func` is used to pre-process the data inserted into the cache, and it will use the `get_file_bytes` method, more configuration refer to [initialize Cache](https://gptcache.readthedocs.io/en/latest/references/gptcache.html#module-gptcache.Cache)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "bbd4d14f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# from gptcache import cache\n",
    "# from gptcache.adapter import openai\n",
    "# from gptcache.processor.pre import get_file_bytes\n",
    "\n",
    "# cache.init(pre_embedding_func=get_file_bytes)\n",
    "# cache.set_openai_key()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3ebda74",
   "metadata": {},
   "source": [
    "### 2. Init for similar match cache\n",
    "\n",
    "When initializing gptcahe, the following four parameters are configured:\n",
    "\n",
    "- `pre_embedding_func`: pre-processing before extracting feature vectors, it will use the `get_file_name` method\n",
    "- `embedding_func`: the method to extract the text feature vector\n",
    "- `data_manager`: DataManager for cache management\n",
    "- `similarity_evaluation`: the evaluation method after the cache hit\n",
    "\n",
    "The `data_manager` is used to audio feature vector, response text in the example, it takes [Milvus](https://milvus.io/docs) (please make sure it is started), you can also configure other vector storage, refer to [VectorBase API](https://gptcache.readthedocs.io/en/latest/references/manager.html#module-gptcache.manager.vector_data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "db7e64ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "from gptcache import cache\n",
    "from gptcache.adapter import openai\n",
    "from gptcache.processor.pre import get_file_name\n",
    "\n",
    "from gptcache.embedding import Data2VecAudio\n",
    "from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n",
    "from gptcache.manager import get_data_manager, CacheBase, VectorBase, ObjectBase\n",
    "\n",
    "\n",
    "data2vec = Data2VecAudio()\n",
    "cache_base = CacheBase('sqlite')\n",
    "vector_base = VectorBase('milvus', host='localhost', port='19530', dimension=data2vec.dimension)\n",
    "data_manager = get_data_manager(cache_base, vector_base)\n",
    "\n",
    "cache.init(\n",
    "    pre_embedding_func=get_file_name,\n",
    "    embedding_func=data2vec.to_embeddings,\n",
    "    data_manager=data_manager,\n",
    "    similarity_evaluation=SearchDistanceEvaluation(),\n",
    "    )\n",
    "cache.set_openai_key()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4201d301",
   "metadata": {},
   "source": [
    "## Run openai speech to text \n",
    "\n",
    "Then run `openai.Audio.translations`, which can translate and transcribe the audio into english, and it is based on large-v2 Whisper model. The file uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.\n",
    "\n",
    "Note that `openai` here is imported from `gptcache.adapter.openai`, which can be used to cache with gptcache at request time. Please download the [blues.00000.mp3](https://github.com/towhee-io/examples/releases/download/data/blues.00000.mp3) before running the following code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7555c46b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "One bourbon, one scotch and one bill Hey Mr. Bartender, come here I want another drink and I want it now My baby she gone, she been gone tonight I ain't seen my baby since night of her life One bourbon, one scotch and one bill\n"
     ]
    }
   ],
   "source": [
    "audio_file= open(\"./blues.00000.mp3\", \"rb\")\n",
    "transcript = openai.Audio.transcribe(\"whisper-1\", audio_file)\n",
    "print(transcript[\"text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02d816d5",
   "metadata": {},
   "source": [
    "## Start with gradio\n",
    "\n",
    "Finally, we can start a gradio application to translate and transcribe the audio.\n",
    "\n",
    "First define the `speech_to_text` method, which is used to generate text based on the input audio and also return whether the cache hit or not. Then start the service with gradio, as shown below:\n",
    "\n",
    "![](../assets/speech_to_text_gradio.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b160e01f",
   "metadata": {},
   "outputs": [],
   "source": [
    "def speech_to_text(audio_file):\n",
    "    audio_file= open(audio_file, \"rb\")\n",
    "    transcript = openai.Audio.transcribe(\"whisper-1\", audio_file)\n",
    "    return transcript[\"text\"], transcript.get(\"gptcache\", False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6cbf624",
   "metadata": {},
   "outputs": [],
   "source": [
    "import gradio\n",
    "\n",
    "interface = gradio.Interface(speech_to_text, \n",
    "                             gradio.Audio(source=\"upload\", type=\"filepath\"),\n",
    "                             [gradio.Textbox(label=\"transcript\"), gradio.Textbox(label=\"is hit\")]\n",
    "                            )\n",
    "\n",
    "interface.launch(inline=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2b5959d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
